Do predictor=2 differencing at sample width, not byte width by brendancol · Pull Request #1498 · xarray-contrib/xarray-spatial

brendancol · 2026-05-06T05:27:02Z

Summary

TIFF predictor=2 specifies differencing between adjacent same-component samples at the sample's natural bit width. The CPU and GPU code was doing it byte-wise, which drops the inter-byte carry for any sample wider than one byte.
uint8 was correct (one byte per sample). uint16, int16, uint32, int32 silently produced wrong pixel values.
Reading a uint16/int16/uint32/int32 predictor=2 TIFF written by libtiff, GDAL, rasterio, or tifffile gave garbage. Writing such a file from xrspatial produced output other libraries decoded as garbage. The internal xrspatial round-trip happened to work because both ends used the same broken arithmetic.
Fix: dispatch predictor_decode/predictor_encode on CPU and GPU to per-dtype kernels (uint8/16/32/64) that operate on a typed view of the byte buffer with stride = samples_per_pixel.
Caught by the deep-pass geotiff accuracy audit (follow-up to CPU fp_predictor_decode gives wrong pixels for multi-band predictor=3 TIFFs #1247). Existing tests only covered uint8 multi-sample for predictor=2.

Test plan

pytest xrspatial/geotiff/tests/test_predictor_multisample.py (33 passed including new GPU+CPU multi-byte cases)
pytest xrspatial/geotiff/tests/ (533 passed; pre-existing matplotlib recursion failures in test_features plotting tests are unrelated to this change and reproduce on main)
tifffile interop: write a uint16 predictor=2 file with xrspatial, read with tifffile, and vice versa; both directions match
GPU CuPy backend exercised on host with CUDA available

TIFF predictor=2 differences adjacent same-component samples at the sample's natural bit width. The CPU and GPU code was differencing byte-wise instead, which drops the inter-byte carry for samples wider than one byte. uint8 was fine because a sample fits in one byte. uint16, int16, uint32, int32 silently came out wrong. What broke: - Reading a uint16/int16/uint32/int32 TIFF written with predictor=2 by libtiff, GDAL, rasterio, or tifffile gave garbage (off by tens of LSBs for typical data). - Writing such a file from xrspatial produced output other libraries decoded as garbage. The xrspatial round-trip happened to work because both ends shared the same broken arithmetic. Fix: rework predictor_decode/predictor_encode on CPU and GPU to view the buffer as the matching unsigned dtype (uint8/16/32/64) in the file's byte order and difference at that resolution with stride = samples_per_pixel. The 8-bit byte path stays as a fast special case since it was already correct. Tests cover reads of tifffile-produced predictor=2 TIFFs across uint16/int16/uint32/int32 and multi-band uint16, writer output verified by tifffile reading it back, and GPU/CPU parity for the multi-byte cases.

The third-pass agent worked from a stale local main and re-derived the predictor=2 sample-wise fix that already shipped in #1498. That PR has been closed as a duplicate. Only the predictor=3 big-endian fix in #1500 is a genuine new finding.

github-actions Bot added the performance PR touches performance-sensitive code label May 6, 2026

Update sweep-accuracy state: geotiff deep pass 2026-05-05

e6b0009

brendancol merged commit ad259d7 into main May 6, 2026
11 checks passed

brendancol deleted the fix-predictor2-multibyte branch May 6, 2026 11:49

brendancol mentioned this pull request May 6, 2026

Decode TIFF predictor=2 sample-wise for multi-byte integer round-trip #1499

Closed

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Do predictor=2 differencing at sample width, not byte width#1498

Do predictor=2 differencing at sample width, not byte width#1498
brendancol merged 2 commits intomainfrom
fix-predictor2-multibyte

brendancol commented May 6, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

brendancol commented May 6, 2026

Summary

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant